Inductive Lexica
Abstract
Machine learning techniques are useful tools for the automatic extension of existing lexical databases. In this paper, we review some symbolic machine learning methods that can be used to add new lexical material to the lexicon by automatically inducing the regularities implicit in the lexical representations already present. We introduce the general methodology for the construction of inductive lexica and discuss empirical results on extending lexica with two types of information: pronunciation and gender.
Similar resources
Transforming Lexica as Trees
We investigate the problem of structurally changing lexica, while preserving the information. We present a type of lexicon transformation that is complete on an interesting class of lexica. Our work is motivated by the problem of merging one or more lexica into one lexicon. Lexica, lexicon schemas, and lexicon transformations are all seen as particular kinds of trees.
Large lexica for speech-to-speech translation: from specification to creation
This paper presents the corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). These lexica will be specified, built and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexic...
xLiD-Lexica: Cross-lingual Linked Data Lexica
In this paper, we introduce our cross-lingual linked data lexica, called xLiD-Lexica, which are constructed by exploiting the multilingual Wikipedia and linked data resources from Linked Open Data (LOD). We provide the cross-lingual groundings of linked data resources from LOD as RDF data, which can be easily integrated into the LOD data sources. In addition, we build a SPARQL endpoint over our...
Lexicon and Corpora for Speech to Speech Translation (LC-STAR)
The objective of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) is corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). During the lifetime of the project (2002-2005) these lexica will be specified, built and validated. Large lexica co...
Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
This paper presents specifications and requirements for creation and validation of large lexica that are needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The prepared language resources are created and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Component...